Assignment 28 - 4 March 2023 : Divya Pardeshi

Q1. Load the "titanic" dataset using the load_dataset function of seaborn. Use Plotly express to plot a scatter plot for age and fare columns in the titanic dataset.

Ans.

In [1]:
#importing the required libraries
import seaborn as sns
import plotly.express as px
In [2]:
# Load the titanic dataset
titanic_data = sns.load_dataset("titanic")
In [3]:
titanic_data.shape
Out[3]:
(891, 15)
In [4]:
titanic_data.head()
Out[4]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
In [5]:
# Create the scatter plot using Plotly Express
fig = px.scatter(titanic_data, x="age", y="fare")

# Show the scatter plot
fig.show()

Q2. Using the tips dataset in the Plotly library, plot a box plot using Plotly express.

Ans.

In [6]:
import plotly.express as px
In [7]:
# Load the tips dataset
tips_data = px.data.tips()
In [8]:
tips_data.shape
Out[8]:
(244, 7)
In [9]:
tips_data.head()
Out[9]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [10]:
# Create the box plot using Plotly Express
fig = px.box(tips_data, x="day", y="total_bill")

# Show the box plot
fig.show()

Q3. Using the tips dataset in the Plotly library, plot a histogram for x="sex" and y="total_bill" columns in the tips dataset. Also, use the "smoker" column with the pattern_shape parameter and the "day" column with the color parameter.

Ans.

In [11]:
import plotly.express as px
In [12]:
# Load the tips dataset
tips_data = px.data.tips()
In [13]:
tips_data.shape
Out[13]:
(244, 7)
In [14]:
tips_data.head()
Out[14]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [15]:
# Create the histogram using Plotly Express
fig = px.histogram(tips_data, x="sex", y="total_bill", color="day", pattern_shape="smoker")

# Show the histogram
fig.show()

Q4. Using the iris dataset in the Plotly library, plot a scatter matrix, using the "species" column for the color parameter. Note: Use only the "sepal_length", "sepal_width", "petal_length", and "petal_width" columns with the dimensions parameter.

Ans.

In [16]:
import plotly.express as px
In [17]:
# Load the iris dataset
iris_data = px.data.iris()
In [18]:
iris_data.shape
Out[18]:
(150, 6)
In [19]:
iris_data.head()
Out[19]:
sepal_length sepal_width petal_length petal_width species species_id
0 5.1 3.5 1.4 0.2 setosa 1
1 4.9 3.0 1.4 0.2 setosa 1
2 4.7 3.2 1.3 0.2 setosa 1
3 4.6 3.1 1.5 0.2 setosa 1
4 5.0 3.6 1.4 0.2 setosa 1
In [20]:
# Create the scatter matrix plot using Plotly Express
fig = px.scatter_matrix(iris_data, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], color="species")

# Show the scatter matrix plot
fig.show()

Q5. What is Distplot? Using Plotly express, plot a distplot.

Ans.

A distplot, or distribution plot, shows how the values of a continuous variable are distributed. It combines a histogram with a smooth line, usually a kernel density estimate (KDE) curve. Seaborn provides it as the distplot function, which draws on top of Matplotlib. Plotly Express has no function of that name, so the cells below build an equivalent with px.histogram: first with a rug plot as the marginal, then with a KDE curve overlaid on the histogram.

In [21]:
import plotly.express as px
import numpy as np

# Create some data
data = np.random.randn(1000)

# Create the distplot
fig = px.histogram(data, marginal="rug")

# Add a title
fig.update_layout(title_text="Distribution of Data")

# Show the plot
fig.show()

We can also plot a distplot using the code below:

In [22]:
import plotly.express as px
import numpy as np
import scipy.stats as stats

# Create a sample dataset
np.random.seed(2)
data = np.random.randn(1000)

# Create the histogram
fig = px.histogram(data, nbins=30, histnorm='probability density')

# Add the KDE curve
x_values = np.linspace(data.min(), data.max(), 100)
y_values = stats.gaussian_kde(data).pdf(x_values)
fig.add_scatter(x=x_values, y=y_values, mode='lines', name='KDE')

# Update the layout
fig.update_layout(title='Distribution Plot')

# Show the plot
fig.show()